Web Page Prediction Based on Conditional Random Fields

Authors

  • Yong Zhen Guo
  • Kotagiri Ramamohanarao
  • Laurence Anthony F. Park
Abstract

Web page prefetching is used to reduce the access latency of the Internet. However, if most prefetched Web pages are not visited by the users in their subsequent accesses, the limited network bandwidth and server resources will be used inefficiently, which can even worsen the access delay problem. Therefore, enhancing Web page prediction accuracy is a key problem in Web page prefetching. Conditional Random Fields (CRFs), which are popular sequential learning models, have already been used successfully for many Natural Language Processing (NLP) tasks such as POS tagging, named entity recognition (NER) and segmentation. In this paper, we propose the use of CRFs in the field of Web page prediction. We treat the access sessions of previous Web users as observation sequences and label each element of these observation sequences to obtain the corresponding label sequences. Based on these observation and label sequences, we then use CRFs to train a prediction model and predict the probable subsequent Web pages for current users. Our experimental results show that CRFs achieve higher Web page prediction accuracy than other popular techniques such as plain Markov Chains and Hidden Markov Models (HMMs).
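The abstract does not specify the labelling scheme or the feature set. The Python sketch below assumes one plausible setup, and it should not be read as the authors' method:

- each page in a session is labelled with the page visited next, and the last page gets a hypothetical `<END>` label;
- features are simple page-identity indicators;
- the linear-chain CRF weights have already been learned (training by maximising the conditional log-likelihood is not shown).

The helper names (`to_training_pair`, `viterbi`, `predict_next`) and the weight values in the usage lines are illustrative assumptions.

```python
from typing import Dict, List, Sequence, Tuple

END = "<END>"  # hypothetical label for the last page of a session (assumption)


def to_training_pair(session: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Turn one access session into (observation sequence, label sequence).

    Assumed scheme: the label at position t is the page visited at t + 1.
    """
    observations = list(session)
    labels = list(session[1:]) + [END]
    return observations, labels


def viterbi(observations: Sequence[str], labels: Sequence[str],
            emit: Dict[Tuple[str, str], float],
            trans: Dict[Tuple[str, str], float],
            start: Dict[str, float]) -> List[str]:
    """Return the highest-scoring label sequence under a linear-chain CRF.

    score(y | x) = start[y_0] + sum_t emit[(x_t, y_t)] + sum_{t>0} trans[(y_{t-1}, y_t)]
    Missing weights count as 0. The partition function Z(x) is not needed
    because it is the same for every candidate y.
    """
    if not observations:
        return []
    # best[y] = score of the best partial label sequence ending in label y
    best = {y: start.get(y, 0.0) + emit.get((observations[0], y), 0.0) for y in labels}
    backpointers = []
    for x in observations[1:]:
        new_best, pointer = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[p] + trans.get((p, y), 0.0))
            new_best[y] = best[prev] + trans.get((prev, y), 0.0) + emit.get((x, y), 0.0)
            pointer[y] = prev
        best = new_best
        backpointers.append(pointer)
    # Trace back from the best final label
    path = [max(labels, key=lambda y: best[y])]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    return path[::-1]


def predict_next(session: Sequence[str], labels: Sequence[str],
                 emit: Dict[Tuple[str, str], float],
                 trans: Dict[Tuple[str, str], float],
                 start: Dict[str, float]) -> str:
    """Predicted next page = label of the last page in the current session."""
    return viterbi(session, labels, emit, trans, start)[-1]


if __name__ == "__main__":
    # Hypothetical, hand-set weights standing in for a trained model.
    labels = ["home", "products", "cart", END]
    emit = {("home", "products"): 2.0, ("products", "cart"): 1.5,
            ("cart", END): 1.0, ("products", END): 0.5}
    trans = {("products", "cart"): 1.0}
    print(predict_next(["home", "products"], labels, emit, trans, {}))  # → cart
```

Because decoding only compares scores, the normalisation term matters only during training. At prediction time, a prefetcher would fetch the label predicted for the last page of the current user's session.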


Similar resources

Grouped ECOC Conditional Random Fields for Prediction of Web User Behavior

Web page prefetching has been shown to reduce Web access latency, but is highly dependent on the accuracy of the Web page prediction method. Conditional Random Fields (CRFs) with Error Correcting Output Coding (ECOC) have been shown to provide highly accurate and efficient Web page prediction on large-size websites. However, the limited class information provided to the binary-label sub-CRF...


Web Access Latency Reduction Using CRF-Based Predictive Caching

Reducing the Web access latency perceived by a Web user has become a problem of interest. Web prefetching and caching are two effective techniques that can be used together to reduce the access latency problem on the Internet. Because the success of Web prefetching mainly relies on the prediction accuracy of prediction methods, in this paper we employ a powerful sequential learning model, Condi...


Victor: the Web-Page Cleaning Tool

In this paper we present a complete solution for automatic cleaning of arbitrary HTML pages with the goal of using web data as a corpus in the area of natural language processing and computational linguistics. We employ a sequence-labeling approach based on Conditional Random Fields (CRF). Every block of text in an analyzed web page is assigned a set of features extracted from the textual content an...


Web Page Cleaning with Conditional Random Fields

This paper describes the participation of the Charles University in Cleaneval 2007, the shared task and competitive evaluation of automatic systems for cleaning arbitrary web pages with the goal of preparing web data for use as a corpus in the area of computational linguistics and natural language processing. We try to solve this task as a sequence-labeling problem and our experimental system i...


Tree-Structured Conditional Random Fields for Semantic Annotation

The large volume of web content needs to be annotated by ontologies (called Semantic Annotation), and our empirical study shows that strong dependencies exist across different types of information (that is, identifying one kind of information can help identify another kind). Conditional Random Fields (CRFs) are the state-of-the-art approaches for modeling t...




Publication date: 2008